Linking coronary artery disease to neurodegenerative diseases through systems genetics

Coronary artery disease (CAD) is still a leading cause of death worldwide despite extensive research and the considerable progress made over the years. Like other cardiovascular diseases, CAD results from the complex interaction between genetic variants and environmental factors. The genetic loci currently associated with CAD reveal the contribution of multiple molecular pathways to its pathogenesis, suggesting the need for a systemic approach to understand the role of genetic determinants. In this study, we investigated how GWAS variants associated with CAD interact with each other and with nearby genes in the context of the coronary artery molecular interactome. GWAS variants associated with CAD were selected from GWAS Catalog; then, a tissue-specific interactome was constructed by integrating protein-protein interactions (PPI) from multiple public repositories with computationally inferred co-expression relationships. To focus on the part of the network most relevant to CAD, we selected the interactions connecting the genes carrying a variant associated with the disease. A functional enrichment analysis conducted on the subnetwork revealed that genes carrying genetic variants associated with CAD closely interact with genes related to relevant biological processes, such as extracellular matrix organization, lipoprotein clearance, arterial morphology and inflammatory response. These results confirm that the identified subnetwork reflects the molecular pathways altered in CAD and intercepted by the selected variants. Interestingly, the most connected nodes of the network included amyloid beta precursor protein (APP) and huntingtin (HTT), both implicated in neurodegenerative disorders. In recent years, interest in the processes shared by cardiovascular diseases and neurodegenerative disorders has been increasing, with growing evidence of a link between CAD and Alzheimer’s disease.
The results obtained in this work support the association between such apparently unrelated diseases and highlight the necessity of a systems biology approach to better elucidate shared pathological mechanisms.


Introduction
Coronary artery disease (CAD) is one of the most prominent global contributors to mortality and has seen a steady increase in disability-adjusted life years (DALYs) since 1990 (Roth et al., 2020), accounting for 8.9 million deaths and 164.0 million DALYs in 2015 (Ralapanawa and Sivakanesan, 2021). The prevalence of CAD varies among countries; however, CAD-related mortality is not an issue specific to high-income areas, since it is a major cause of death in countries from all income groups, with a steady increase in developing countries (Ralapanawa and Sivakanesan, 2021). In Western Europe, there were an estimated 9.1 million prevalent cases of CAD in 2015 (Roth et al., 2017).
This condition is characterized by the development of atherosclerosis within the coronary arteries, resulting in an impaired supply of nutrients and oxygen to the myocardium. Atherosclerosis is primarily an inflammatory disorder driven by the interaction between the arterial endothelium and a variety of risk factors, such as dyslipidemia, hypertension and proinflammatory cytokines derived from excess adipose tissue. The inflammation results in cell proliferation, lipid accumulation and extracellular matrix (ECM) production, which thicken the arterial walls and form the characteristic atherosclerotic plaque. These plaques progressively reduce blood flow to the heart muscle, leading to severe consequences, including angina, myocardial infarction (MI), arrhythmias, heart failure, and sudden cardiac death (Libby and Theroux, 2005; Erdmann et al., 2018).
Like most cardiovascular diseases, CAD is the result of the complex interplay between environmental, behavioral and genetic factors. Smoking, hyperlipidemia, hypertension, abdominal obesity, and diabetes are known to be related to CAD and to an increased risk of MI (Yusuf et al., 2004; Barquera et al., 2015). Parental history of CAD was also shown to be an independent risk factor, demonstrating the existence of a genetic contribution to the disease (Myers et al., 1990; Marenberg et al., 1994).
Since their development, genome-wide association studies (GWAS) have been employed to investigate the genetic variants related to CAD. These studies have identified a large number of common variants significantly associated with the disease (Khera and Kathiresan, 2017; Erdmann et al., 2018).
Evaluating the impact of such a complex landscape of associated loci and identifying the related pathological mechanisms could benefit from a network approach. In this study, we investigate the role of variants associated with CAD in the context of a tissue-specific network, to explore how the corresponding genes interact with each other and with their neighbors.

CAD associated variants
Since the aim of this work is to explore CAD genetic variants with a systems biology approach, we retrieved the data available in GWAS Catalog, a public repository of manually curated SNP-trait associations derived from the literature (Sollis et al., 2023). To identify the variants associated with the disease of interest, we selected from the complete GWAS Catalog database only the SNPs associated with one of the following traits: "Coronary artery disease", "Coronary heart disease" and "Ischemic heart disease".
To select genetic variants that affect the final protein product, we retained only missense and nonsense mutations. Without additional filters, this search returned results from a total of ten different studies carried out from 2011 to 2020, all of which include at least a thousand cases, except for Takeuchi et al. All reported variants have a p-value < 10⁻⁵.
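The selection described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record fields (`snp`, `gene`, `trait`, `consequence`, `p_value`), the consequence labels and the toy records are hypothetical, and the p-value cutoff is applied as an explicit filter only to mirror the threshold stated in the text.

```python
# Hypothetical, simplified records: real GWAS Catalog exports use different
# column names, so every field name below is an assumption for illustration.
CAD_TRAITS = {
    "coronary artery disease",
    "coronary heart disease",
    "ischemic heart disease",
}
# Assumed labels for missense and nonsense consequences.
PROTEIN_ALTERING = {"missense_variant", "stop_gained"}


def select_cad_variants(records, p_max=1e-5):
    """Keep protein-altering SNPs associated with a CAD trait at p < p_max."""
    selected = []
    for rec in records:
        if (
            rec["trait"].lower() in CAD_TRAITS
            and rec["consequence"] in PROTEIN_ALTERING
            and rec["p_value"] < p_max
        ):
            selected.append(rec)
    return selected


# Toy records (invented identifiers, not real associations).
toy_records = [
    {"snp": "rs_toy1", "gene": "GENE_A", "trait": "Coronary artery disease",
     "consequence": "missense_variant", "p_value": 3e-9},
    {"snp": "rs_toy2", "gene": "GENE_B", "trait": "Coronary artery disease",
     "consequence": "intron_variant", "p_value": 1e-12},
    {"snp": "rs_toy3", "gene": "GENE_C", "trait": "Height",
     "consequence": "missense_variant", "p_value": 2e-8},
    {"snp": "rs_toy4", "gene": "GENE_D", "trait": "Ischemic heart disease",
     "consequence": "stop_gained", "p_value": 4e-6},
]

cad_variants = select_cad_variants(toy_records)
# Unique genes carrying at least one selected variant.
cad_genes = sorted({rec["gene"] for rec in cad_variants})
```

In this toy example only the missense and nonsense variants linked to a CAD trait survive, and the resulting gene list is what would be mapped onto the network.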

Coronary artery interactome construction
The network specific for coronary artery tissue was constructed by integrating protein-protein interactions (PPIs) from multiple public repositories and computationally inferred co-expression relationships (Figure 1).
First, PPIs were gathered from three recently updated public repositories: STRING (Szklarczyk et al., 2019), which includes physical and functional associations from both high-throughput experiments and computational predictions; mentha (Calderone, Castagnoli and Cesareni, 2013), a manually curated collection of interactions from published literature; and IntAct (Orchard et al., 2014), which contains manually curated, experimentally derived PPIs. We collected all human-specific associations included in the selected databases and filtered them to keep only high confidence interactions, identified by the confidence score provided by each database (STRING: >0.7, mentha: >0.5, IntAct: >0.45). Each database computes its score with a different approach; however, every score represents the amount of evidence supporting the association, in terms of the quantity and quality of the studies and experiments sourced by the database.
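As a rough illustration of the confidence filtering, the sketch below applies the per-database thresholds reported above to a toy list of interactions and merges the survivors into a set of undirected edges. The tuple layout and the toy interactions are hypothetical, and all scores are assumed to be already expressed on a 0 to 1 scale.

```python
# Thresholds as reported in the text; scores are assumed to lie in [0, 1].
THRESHOLDS = {"STRING": 0.7, "mentha": 0.5, "IntAct": 0.45}


def high_confidence_edges(interactions):
    """Return the undirected gene pairs whose score exceeds the threshold
    of the database they come from. Duplicates across databases collapse
    into a single edge because edges are stored as frozensets."""
    edges = set()
    for source, gene_a, gene_b, score in interactions:
        if score > THRESHOLDS[source]:
            edges.add(frozenset((gene_a, gene_b)))
    return edges


# Toy interactions: (database, gene, gene, confidence score).
toy_interactions = [
    ("STRING", "A", "B", 0.90),   # kept
    ("STRING", "A", "C", 0.40),   # below the STRING threshold
    ("mentha", "B", "A", 0.60),   # kept, same edge as A-B
    ("IntAct", "C", "D", 0.45),   # not strictly above the IntAct threshold
    ("IntAct", "D", "E", 0.50),   # kept
]

ppi_edges = high_confidence_edges(toy_interactions)
```

Storing each edge as a `frozenset` is one simple way to treat interactions as undirected, so that the same pair reported by two repositories is counted once.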
Additionally, we integrated the PPIs with the multiple types of human-specific associations available in BioGRID, a curated database that includes both genetic and protein interactions (Oughtred et al., 2021). We used gene expression data retrieved from the Genotype-Tissue Expression (GTEx) database (Carithers and Moore, 2013), a public repository collecting gene expression data from multiple non-diseased human tissues, to identify genes that appear to be expressed in coronary artery. The gene expression data included RNA-seq data for 240 samples of coronary artery tissue obtained from healthy human donors. Genes were considered expressed if they had TPM > 0 in at least 80% of the samples included in the dataset. To ensure the tissue specificity of the interaction network, we removed associations that included genes not selected as expressed in coronary artery.
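The expression filter can be sketched as follows, assuming a hypothetical dictionary that maps each gene to its TPM values across samples. The function names and toy values are illustrative only; the criterion (TPM > 0 in at least 80% of samples) is the one stated above.

```python
def expressed_genes(tpm_by_gene, min_fraction=0.8):
    """Return genes with TPM > 0 in at least min_fraction of the samples."""
    expressed = set()
    for gene, tpms in tpm_by_gene.items():
        n_detected = sum(1 for value in tpms if value > 0)
        if n_detected >= min_fraction * len(tpms):
            expressed.add(gene)
    return expressed


def restrict_to_expressed(edges, expressed):
    """Drop every association involving a gene not expressed in the tissue."""
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


# Toy TPM matrix with five samples per gene (invented values).
toy_tpm = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # detected in 5/5 samples
    "GENE_B": [0.0, 1.0, 1.0, 1.0, 1.0],  # detected in 4/5 samples (80%)
    "GENE_C": [0.0, 0.0, 1.0, 1.0, 1.0],  # detected in 3/5 samples
}
toy_edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_B", "GENE_C")]

tissue_genes = expressed_genes(toy_tpm)
tissue_edges = restrict_to_expressed(toy_edges, tissue_genes)
```

Both endpoints of an association must pass the filter, so a single non-expressed gene removes all of its edges from the tissue-specific network.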
Then, GTEx gene expression data were used to estimate the statistical dependencies between the expression profiles of genes that included CAD-associated variants and those of the other genes present in the gene expression dataset. Dependencies were computed using the measure of mutual information implemented in the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) (Margolin et al., 2006), an algorithm designed to computationally infer mammalian transcriptional networks from transcriptomics data. ARACNe estimates relationships between the expression profiles of different genes as pairwise mutual information, a measure that also captures non-linear relationships. Mutual information is described by the equation:

I(x, y) = S(x) + S(y) - S(x, y)

where S(x) and S(y) are the entropies of the variables x and y, respectively, and S(x, y) is the joint entropy of the variables x and y, here representing the gene expression profiles. The statistical associations between genes were considered significant if p < 10⁻¹⁵ (Percio et al., 2014), since significance thresholds for network construction are conventionally stricter than the standard p < 0.05 to account for the high number of connections (Margolin et al., 2006).
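To make the entropy-based definition of mutual information concrete, the following sketch computes it for two toy profiles. It deliberately swaps in a simple equal-width histogram estimator, which is not the estimator used by ARACNe; the function names, the number of bins and the toy profiles are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, n_bins=2):
    """Equal-width binning of a continuous profile (illustrative choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for flat profiles
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=2):
    """Mutual information as S(x) + S(y) - S(x, y) on binned profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    joint = list(zip(bx, by))
    return entropy(bx) + entropy(by) - entropy(joint)


# Toy profiles: a profile shares all of its information with itself,
# while the second pair is constructed to be independent after binning.
profile = [1, 2, 3, 4, 5, 6, 7, 8]
mi_self = mutual_information(profile, profile)
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

With this estimator the mutual information of a profile with itself equals its own entropy, and it drops to zero when the joint distribution factorizes, which are the two limiting cases of the equation.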

Network analysis
Since we were interested in how genes carrying a genetic variant associated with CAD interact with each other in the context of the coronary artery interactome, we focused on the subnetwork formed by the shortest paths connecting them with each other, as it highlights the core part of the graph responsible for the communication between the genes of interest. The shortest paths were defined as all the alternative paths connecting two selected nodes through the minimal number of vertices.
To select only the interactions included in the shortest paths connecting each pair of genes selected from GWAS Catalog, we applied an unweighted breadth-first search, implemented in the function all_shortest_paths of the R package igraph (Csardi and Nepusz, 2006). The algorithm starts from a source node, explores its immediate neighbors and progressively traverses the network by moving to the next-level neighbors. The algorithm stops when it reaches the target node.
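The idea can be sketched in plain Python as below. This is an illustrative reimplementation of a breadth-first enumeration of all shortest paths, not the igraph code used in the study; the adjacency dictionary, the helper `shortest_path_subnetwork` and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adjacency, source, target):
    """Enumerate every minimum-length path from source to target with BFS."""
    distance = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # every predecessor of the target has already been visited
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif distance[neighbor] == distance[node] + 1:
                parents[neighbor].append(node)  # alternative shortest route
    if target not in distance:
        return []

    def backtrack(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in backtrack(p)]

    return backtrack(target)


def shortest_path_subnetwork(adjacency, seeds):
    """Union of the edges lying on a shortest path between any two seeds."""
    edges = set()
    for s, t in combinations(seeds, 2):
        for path in all_shortest_paths(adjacency, s, t):
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges


# Toy undirected graph: two equally short routes from A to D, plus a leaf F.
toy_graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D", "F"], "F": ["E"],
}
paths_a_d = all_shortest_paths(toy_graph, "A", "D")
sub_edges = shortest_path_subnetwork(toy_graph, ["A", "E"])
```

In the toy graph both routes from A to D are retained because they have the same length, while the edge E-F is left out of the subnetwork since it lies on no shortest path between the seed nodes.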
A functional analysis was carried out on the genes included in the selected subnetwork to identify the biological processes associated with its nodes. The enrichment of gene sets from the hallmark (H), curated (C2) and ontology (C5) collections retrieved from MSigDB (Liberzon et al., 2011) was tested using an over-representation approach, which evaluates the overlap between each gene set and the set of genes of interest (Khatri et al., 2012). The resulting p-values were corrected for multiple testing using the Benjamini-Hochberg procedure, and the significance threshold was set at FDR < 0.05.
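A minimal sketch of an over-representation test followed by Benjamini-Hochberg correction is given below. It assumes a hypergeometric test for the overlap, which is a common choice for this kind of analysis but is not specified in the text; function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, query_size, universe_size):
    """P(X >= overlap) when query_size genes are drawn from a universe of
    universe_size genes, set_size of which belong to the gene set."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    favorable = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: both genes of a 2-gene query fall in a 2-gene set,
# with a universe of 4 genes.
p_toy = hypergeom_pvalue(overlap=2, set_size=2, query_size=2, universe_size=4)

# Toy p-values for four gene sets, then FDR < 0.05 selection.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

Only the first toy gene set passes the FDR < 0.05 threshold here, which shows how the correction can remove terms that would look significant at an uncorrected p < 0.05.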

Results
The identified subnetwork reflects key processes in CAD
We constructed a tissue-specific heterogeneous network including protein-protein interactions, genetic interactions, and co-expression relationships to model the interactome of the coronary artery. The resulting network consisted of more than 800,000 associations, connecting 18,378 nodes.
To understand the role of CAD-associated variants in the tissue-specific network, we selected from GWAS Catalog the genetic variants reported to be associated with CAD (Supplementary Table S1). The complete list of studies selected from GWAS Catalog is reported in Table 1. All the studies have a solid experimental design and an adequate number of cases and controls, including at least a thousand cases, except for a single study. The selected articles report variants from multiple ethnic groups, providing a more complete and heterogeneous view of the genetic landscape underlying CAD. The final list of variants reported by the selected studies included 40 unique genetic variants mapped to 36 distinct genes. All selected variants have a SNP-trait p-value < 10⁻⁵. This threshold is looser than the conventional threshold for genetic studies, since the strength of the network approach lies in the possibility of identifying the pathways shared by the different variants, including those that would have limited individual significance (Wang et al., 2010).
We located the 36 genes affected by at least one of the variants on the network and identified the shortest paths connecting them with each other. To focus on the portion of the network more relevant to CAD, we isolated the connected subnetwork constituted by the shortest paths, obtaining 4,383 nodes and 49,075 edges (Figure 2).

FIGURE 1
Workflow of the study.

Frontiers in Genetics frontiersin.org 03
A functional enrichment analysis carried out on the subnetwork confirmed its association with biological processes known to be involved in the pathogenesis of the disease of interest (Figure 3, Supplementary Table S2). Significant terms included extracellular matrix organization, lipoprotein binding and regulation, inflammatory response, and arterial morphology. Additionally, 24 nodes of the subnetwork were included in the gene set HP_CORONARY_ARTERY_ATHEROSCLEROSIS (FDR = 0.0012) from the Human Phenotype Ontology collection of MSigDB. Most of these nodes (19 out of 24) were not included in the initial list of genes carrying CAD-associated variants.

CAD genes are closely connected to genes related to neurodegenerative diseases
To investigate the pivotal nodes of the subnetwork, we focused on the 1% most connected hubs. The resulting list included 43 genes, nine of which were not among the genes carrying GWAS variants (Figure 4). Interestingly, the highly connected nodes included APP (amyloid beta precursor protein) and HTT (huntingtin), commonly associated with Alzheimer's disease and Huntington's disease, respectively. Additionally, we found multiple gene sets related to neurodegenerative disorders to be enriched in the selected subnetwork for CAD (Figure 5, Supplementary Table S3). This list included terms associated with Alzheimer's disease, amyloid fibril formation and binding, neurodegeneration, and pathways in Huntington's disease.
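The hub selection can be sketched as a ranking of nodes by degree. The rounding rule is not stated in the text, so truncation is assumed here; for a subnetwork of 4,383 nodes this yields 43 hubs, in line with the count reported above. The function name and the toy edge list are hypothetical.

```python
from collections import Counter


def top_hubs(edges, fraction=0.01):
    """Return the most connected nodes, ranked by degree.

    The number of hubs is the truncated product of fraction and node
    count (an assumed rounding rule), with at least one hub returned.
    Ties are broken alphabetically to keep the output deterministic.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_hubs = max(1, int(fraction * len(degree)))
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:n_hubs]


# Toy star-like graph: node H touches ten leaves, and two leaves are linked.
toy_hub_edges = [("H", f"L{i}") for i in range(10)] + [("L0", "L1")]
hubs = top_hubs(toy_hub_edges, fraction=0.1)
```

Degree is used here as the only measure of connectivity; other centrality measures, such as betweenness, would require a separate computation.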

Discussion
In this study, we employed a systems biology approach to investigate the role of genetic variants associated with CAD in the context of the coronary artery interactome and to characterize the genes that are closely connected to them.
First, we constructed a tissue-specific network integrating different types of interactions, both derived from public repositories and computationally inferred. High confidence PPIs were gathered from STRING, mentha and IntAct; additionally, genetic and protein interaction data were collected from the BioGRID database. To ensure tissue specificity, gene expression data from coronary artery tissue were obtained from the GTEx database and used to select only interactions involving genes expressed in the tissue. Then, GTEx data were used to estimate statistical dependencies between variant genes and every other expressed gene in the dataset.

FIGURE 2
CAD connected subnetwork including the shortest paths connecting nodes affected by genetic variants associated with CAD (shown in yellow).

FIGURE 3
Barplot representation of the statistical significance of terms relevant to CAD pathological mechanisms found significant in the selected subnetwork.

Vescio and Pattini 10.3389/fgene.2024.1344081
Then, we identified the portion of the coronary artery interactome more relevant to CAD by intercepting the nodes responsible for the communication between variants. Specifically, we selected the subnetwork constituted by the shortest paths connecting the genes that included at least one of the genetic variants associated with CAD in GWAS Catalog.
An enrichment analysis conducted on the subnetwork revealed its association with molecular mechanisms with a critical role in atherosclerosis. For instance, the genes included in the core subnetwork were involved in the inflammatory response, a process known to contribute to the formation of the atherosclerotic plaque (Libby and Theroux, 2005; Erdmann et al., 2018). Additionally, the subnetwork shows an overrepresentation of genes associated with lipoprotein metabolism, including lipoprotein assembly and clearance, regulation of lipoprotein levels and regulation of cholesterol production. This further confirms the association between the genes in the subnetwork and key pathological mechanisms in CAD, since the accumulation of lipids in the vascular lumen is widely known as the trigger of coronary plaque formation (Libby and Theroux, 2005; Erdmann et al., 2018). Another important process in the onset of atherosclerosis is the proliferation and remodeling of the ECM, leading to the thickening of arterial walls. Terms related to these mechanisms were also found to be significantly enriched in the subnetwork (Libby and Theroux, 2005; Erdmann et al., 2018).

FIGURE 5
Barplot representation of the statistical significance of terms related to neurodegenerative disorders found significant in the selected subnetwork.
These results, together with the overrepresentation inside the subnetwork of a term specific for coronary artery atherosclerosis, confirm that the systems biology approach proposed in this work is able to intercept genes relevant to the disease of interest that could have a role in the pathogenesis even in the absence of an established association with CAD.
In the second part of the study, we focused on the pivotal nodes of the subnetwork by selecting the most connected hubs. The list of the 1% most connected nodes included nine genes that were not among the CAD-associated genes selected from GWAS Catalog, such as fibronectin leucine rich transmembrane protein 3 (FLRT3) and transmembrane protein 185A (TMEM185A), broadly expressed transmembrane proteins; titin (TTN), known for its involvement in dilated cardiomyopathy (Ware and Cook, 2018); WAS/WASL interacting protein family member 3 (WIPF3), a protein that seems to be involved in cytoskeletal dynamics (De Luca et al., 2023); protein S (PROS1), a protein responsible for the downregulation of thrombin generation, whose deficiency is associated with thromboembolism (Ten Kate and Van Der Meer, 2008); coiled-coil-helix-coiled-coil-helix domain containing 7 (CHCHD7); and golgin A2 (GOLGA2).
Interestingly, the most connected hubs also included amyloid beta precursor protein (APP), a single-pass transmembrane protein expressed in the brain whose accumulation generates the characteristic plaques found in the brains of patients with Alzheimer's disease (O'Brien and Wong, 2011); and huntingtin (HTT), a disease gene linked to Huntington's disease, a monogenic neurological disorder characterized by the dysfunction and death of neurons (Bates et al., 2015).
The central role of genes linked to neurodegenerative diseases suggests a connection between the CAD-related subnetwork and neurological disorders, which was also highlighted by the overrepresentation of genes linked to Alzheimer's disease, Huntington's disease, and neurodegeneration.
The link between neurodegenerative disorders and cardiovascular diseases has been under investigation in recent years, with increasing evidence that heart failure, atrial fibrillation and CAD are associated with an increased incidence of dementia and cognitive impairment (Tini et al., 2020; Zhao et al., 2020; Rivard et al., 2022). The nature of the relationship is still unclear; however, it seems to persist even after adjusting for confounding factors, suggesting that the correlation is not simply due to shared risk factors (Rivard et al., 2022).
The relationship between CAD and Alzheimer's disease is particularly relevant, since patients with severe atherosclerosis show a risk of developing Alzheimer's disease or vascular dementia that is three times higher than that of the general population (Hofman et al., 1997). However, a previous study evaluating their shared genetic architecture failed to find evidence of a causal relevance of CAD for the risk of Alzheimer's disease, with the exception of the APOE locus (Grace et al., 2018). One possible explanation of this association is the reduction or impairment of cerebral perfusion caused by vascular damage, arterial stenosis and increased arterial stiffness (De La Torre, 2004; Tini et al., 2020).
Furthermore, recent studies suggest that the deposition of amyloid beta (Aβ) peptides, proteolytic fragments of APP, could be the overlapping molecular mechanism between Alzheimer's disease and CAD. Accumulation of Aβ peptides, besides being an established characteristic of the brain of Alzheimer's disease patients, was also reported in atherosclerotic plaques. Plasma levels of Aβ1-40 were associated with changes in aortic stiffness, higher carotid intima-media thickness and the severity and extent of arterial damage. Moreover, circulating levels of Aβ1-40 were correlated with the presence of angiographically documented CAD in multiple cohorts (Stakos et al., 2020).
The subnetwork identified in this study supports this hypothesis, since APP has a central role as a highly connected hub and the nodes in the subnetwork are strongly enriched in genes involved in the regulation of amyloid beta fibril formation, Aβ clearance, and catabolic processes of APP.
In conclusion, this work suggests that a systems biology approach can be a valuable tool for exploring the molecular processes shared by CAD and Alzheimer's disease, beyond the analysis of single genetic variants (Grace et al., 2018). A network approach is able to intercept relevant genes without an explicit genetic association with the disease, which could help elucidate interaction mechanisms and discover new associations with CAD.

FIGURE 4
(A) Scatterplot displaying degree and betweenness centrality for each node of the CAD subnetwork. Hubs among the top 1% most connected nodes are shown as color-filled circles, yellow if they are part of the CAD GWAS variants, blue otherwise. (B) List of the hubs not included in CAD GWAS variants [blue in (A)].

TABLE 1
List of articles reporting CAD-associated variants selected from GWAS Catalog.